[Feat] Enable VAE parallel in HunyuanImage3 by Fishermanykx · Pull Request #3091 · vllm-project/vllm-omni

Fishermanykx · 2026-04-24T03:02:12Z

Summary

Enable VAE parallel support in HunyuanImage3.

Current changes:

add a distributed Hunyuan VAE wrapper at vllm_omni/diffusion/distributed/autoencoders/autoencoder_kl_hunyuan.py
wire HunyuanImage3Pipeline to use the distributed autoencoder wrapper
remove the NPU fused MoE init hook in vllm_omni/platforms/npu/models/hunyuan_fused_moe.py

unified deploy yaml in #3172

Validation

static checks only so far (py_compile, diff checks)
runtime validation is still pending

Test Plan

Tested on 4xAscend NPU

server

vllm serve $model --omni --port "8031" \
    --log-stats \
    --stage-configs-path "vllm_omni/platforms/npu/stage_configs/hunyuan_image3_t2i.yaml"

vae_patch_parallel_size is set to 4

client

curl -X POST http://localhost:8031/v1/images/generations \
  -H "Content-Type: application/json" \
  -d '{
    "prompt": 
    "A cinematic medium shot captures a single Asian woman seated on a chair within a dimly lit room, creating an intimate and theatrical atmosphere. The composition is focused on the subject, rendered with rich colors and intricate textures that evoke a nostalgic and moody feeling.\n\nThe primary subject is a young Asian woman with a thoughtful and expressive countenance, her gaze directed slightly away from the camera. She is seated in a relaxed yet elegant posture on an ornate, vintage armchair. The chair is upholstered in a deep red velvet, its fabric showing detailed, intricate textures and slight signs of wear. She wears a simple, elegant dress in a dark teal hue, the material catching the light in a way that reveals its fine-woven texture. Her skin has a soft, matte quality, and the light delicately models the contours of her face and arms.\n\nThe surrounding room is characterized by its vintage decor, which contributes to the historic and evocative mood. In the immediate background, partially blurred due to a shallow depth of field consistent with a f/2.8 aperture, the wall is covered with wallpaper featuring a subtle, damask pattern. The overall color palette is a carefully balanced interplay of deep teal and rich red hues, creating a visually compelling and cohesive environment. The entire scene is detailed, from the fibers of the upholstery to the subtle patterns on the wall.\n\nThe lighting is highly dramatic and artistic, defined by high contrast and pronounced shadow play. A single key light source, positioned off-camera, projects gobo lighting patterns onto the scene, casting intricate shapes of light and shadow across the woman and the back wall. These dramatic shadows create a strong scense of depth and a theatrical quality. While some shadows are deep and defined, others remain soft, gently wrapping around the subject and preventing the loss of detail in darker areas. 
The soft focus on the background enhances the intimate feeling, drawing all attention to the expressive subject. The overall image presents a cinematic, photorealistic photography style.",
    "num_inference_steps": 2,
    "guidance_scale": "1.0",
    "n": 1,
    "size": "1024x1024",
    "seed": 42
  }' | jq -r '.data[0].b64_json' | base64 -d > output.png
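
For anyone who prefers to script the client side, here is a minimal Python sketch of the same request, using only the standard library. It assumes the server returns the shape the `jq` filter above reads (`data[0].b64_json`); the helper names `build_request` and `decode_first_image` are hypothetical and not part of vllm-omni.

```python
import base64
import json
import urllib.request


def build_request(prompt: str, port: int = 8031) -> urllib.request.Request:
    """Build the same POST request as the curl command (hypothetical helper)."""
    body = {
        "prompt": prompt,
        "num_inference_steps": 2,
        "guidance_scale": "1.0",  # kept as a string to mirror the curl payload
        "n": 1,
        "size": "1024x1024",
        "seed": 42,
    }
    return urllib.request.Request(
        f"http://localhost:{port}/v1/images/generations",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def decode_first_image(response_body: bytes) -> bytes:
    """Extract and base64-decode data[0].b64_json, like the jq | base64 -d pipeline."""
    payload = json.loads(response_body)
    return base64.b64decode(payload["data"][0]["b64_json"])
```

With a server running, `urllib.request.urlopen(build_request(prompt)).read()` would supply the bytes for `decode_first_image`, and the result can be written to `output.png`.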

Test Result

output

VAE decode time: 625.7 ms -> 355 ms (about 1.76x faster)

w/o vae parallel

w vae parallel

Fishermanykx · 2026-04-24T03:37:57Z

PTAL @gcanlin @Semmer2

hsliuustc0106 · 2026-04-24T07:30:08Z

Does it work on GPU as well?
Does it affect accuracy?

Bounty-hunter

LGTM

BLANKETusers · 2026-05-14T08:28:30Z

Test Plan

Tested on 2xH200 GPU

VAE

python vllm-omni/examples/offline_inference/hunyuan_image3/end2end.py \
  --model tencent/HunyuanImage-3.0-Instruct \
  --modality text2img \
  --deploy-config vllm-omni/vllm_omni/deploy/hunyuan_image3_dit.yaml \
  --prompts "A cinematic medium shot captures a single Asian woman seated on a chair within a dimly lit room, creating an intimate and theatrical atmosphere. The composition is focused on the subject, rendered with rich colors and intricate textures that evoke a nostalgic and moody feeling.\n\nThe primary subject is a young Asian woman with a thoughtful and expressive countenance, her gaze directed slightly away from the camera. She is seated in a relaxed yet elegant posture on an ornate, vintage armchair. The chair is upholstered in a deep red velvet, its fabric showing detailed, intricate textures and slight signs of wear. She wears a simple, elegant dress in a dark teal hue, the material catching the light in a way that reveals its fine-woven texture. Her skin has a soft, matte quality, and the light delicately models the contours of her face and arms.\n\nThe surrounding room is characterized by its vintage decor, which contributes to the historic and evocative mood. In the immediate background, partially blurred due to a shallow depth of field consistent with a f/2.8 aperture, the wall is covered with wallpaper featuring a subtle, damask pattern. The overall color palette is a carefully balanced interplay of deep teal and rich red hues, creating a visually compelling and cohesive environment. The entire scene is detailed, from the fibers of the upholstery to the subtle patterns on the wall.\n\nThe lighting is highly dramatic and artistic, defined by high contrast and pronounced shadow play. A single key light source, positioned off-camera, projects gobo lighting patterns onto the scene, casting intricate shapes of light and shadow across the woman and the back wall. These dramatic shadows create a strong scense of depth and a theatrical quality. While some shadows are deep and defined, others remain soft, gently wrapping around the subject and preventing the loss of detail in darker areas. 
The soft focus on the background enhances the intimate feeling, drawing all attention to the expressive subject. The overall image presents a cinematic, photorealistic photography style." \
  --output ./output/output_offline_vae \
  --vae-use-tiling

No VAE

python vllm-omni/examples/offline_inference/hunyuan_image3/end2end.py \
  --model tencent/HunyuanImage-3.0-Instruct \
  --modality text2img \
  --deploy-config vllm-omni/vllm_omni/deploy/hunyuan_image3_dit.yaml \
  --prompts "A cinematic medium shot captures a single Asian woman seated on a chair within a dimly lit room, creating an intimate and theatrical atmosphere. The composition is focused on the subject, rendered with rich colors and intricate textures that evoke a nostalgic and moody feeling.\n\nThe primary subject is a young Asian woman with a thoughtful and expressive countenance, her gaze directed slightly away from the camera. She is seated in a relaxed yet elegant posture on an ornate, vintage armchair. The chair is upholstered in a deep red velvet, its fabric showing detailed, intricate textures and slight signs of wear. She wears a simple, elegant dress in a dark teal hue, the material catching the light in a way that reveals its fine-woven texture. Her skin has a soft, matte quality, and the light delicately models the contours of her face and arms.\n\nThe surrounding room is characterized by its vintage decor, which contributes to the historic and evocative mood. In the immediate background, partially blurred due to a shallow depth of field consistent with a f/2.8 aperture, the wall is covered with wallpaper featuring a subtle, damask pattern. The overall color palette is a carefully balanced interplay of deep teal and rich red hues, creating a visually compelling and cohesive environment. The entire scene is detailed, from the fibers of the upholstery to the subtle patterns on the wall.\n\nThe lighting is highly dramatic and artistic, defined by high contrast and pronounced shadow play. A single key light source, positioned off-camera, projects gobo lighting patterns onto the scene, casting intricate shapes of light and shadow across the woman and the back wall. These dramatic shadows create a strong scense of depth and a theatrical quality. While some shadows are deep and defined, others remain soft, gently wrapping around the subject and preventing the loss of detail in darker areas. 
The soft focus on the background enhances the intimate feeling, drawing all attention to the expressive subject. The overall image presents a cinematic, photorealistic photography style." \
  --output ./output/output_offline_vae

Test Result

VAE

No VAE

CLIP Score

99.85/100

Gaohan123

Here are some suggestions:

Please add a simple unit test (UT) for it.
I didn't see any NPU-related modification, which is inconsistent with your PR description.

Fishermanykx · 2026-05-15T07:01:34Z

remove the NPU fused MoE init hook in vllm_omni/platforms/npu/models/hunyuan_fused_moe.py

done
The removal of the NPU fused MoE init hook in vllm_omni/platforms/npu/models/hunyuan_fused_moe.py was done in pull 2979, which had not been merged when this PR was proposed. After I rebased my code, that change is no longer part of this PR.

Signed-off-by: KexiongYu <yukexiong1@huawei.com>

Gaohan123

LGTM. Thanks

RuixiangMa · 2026-05-19T05:26:15Z

+logger = init_logger(__name__)
+
+
+class DistributedAutoencoderKLHunyuan(AutoencoderKLConv3D, DistributedVaeMixin):


from_pretrained is missing; for consistency, I suggest adding it.

RuixiangMa · 2026-05-19T05:32:24Z

+    def encode_tile_exec(self, task: TileTask) -> torch.Tensor:
+        return self.encoder(task.tensor)
+
+    def encode_tile_merge(


tile_merge and encode_tile_merge are byte-for-byte identical. Could you extract a shared helper?

RuixiangMa · 2026-05-19T05:33:19Z

+        torch.nn.Module.__init__(self)
+        self.tile_latent_min_size = 2
+        self.tile_sample_min_size = 2
+        self.tile_overlap_factor = 0.0


tile_overlap_factor is hardcoded to 0.0 (the real default is 0.25), so the blend logic is never tested.

autoencoder_kl_hunyuan imports AutoencoderKLConv3D from the hunyuan_image3 package, which triggers hunyuan_image3/__init__.py to execute and import pipeline_hunyuan_image3, which in turn imported DistributedAutoencoderKLHunyuan back from autoencoder_kl_hunyuan before it finished initializing, causing a circular import error during test collection. Fix by moving the top-level import of DistributedAutoencoderKLHunyuan into HunyuanImage3Pipeline.__init__ as a lazy import, so it is only resolved at call time when both modules are fully initialized. Signed-off-by: zzh <943967662@qq.com>
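
The import cycle described in this commit message can be reproduced in miniature. The sketch below is an illustration only: it writes a throwaway package and wrapper module (all names such as `demo_pkg_*`, `Wrapper`, and `Pipeline` are hypothetical stand-ins, not the real vllm-omni modules) and imports the wrapper first, as test collection would. With a top-level import in the pipeline module the import fails; with the import moved into `__init__` it succeeds.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def build_demo(root: Path, tag: str, lazy: bool) -> str:
    """Write a tiny package/wrapper pair that mirrors the cycle; return the wrapper name."""
    pkg, wrapper = f"demo_pkg_{tag}", f"demo_wrapper_{tag}"
    (root / pkg).mkdir()
    # The package __init__ eagerly imports the pipeline module.
    (root / pkg / "__init__.py").write_text("from .pipeline import Pipeline\n")
    (root / pkg / "base.py").write_text("class Base:\n    pass\n")
    if lazy:
        # Fix: resolve the wrapper only when the pipeline is constructed.
        pipeline_src = (
            "class Pipeline:\n"
            "    def __init__(self):\n"
            f"        from {wrapper} import Wrapper\n"
            "        self.vae = Wrapper()\n"
        )
    else:
        # Bug: top-level import runs while the wrapper module is half-initialized.
        pipeline_src = (
            f"from {wrapper} import Wrapper\n"
            "class Pipeline:\n"
            "    def __init__(self):\n"
            "        self.vae = Wrapper()\n"
        )
    (root / pkg / "pipeline.py").write_text(pipeline_src)
    # The wrapper imports its base class from inside the package.
    (root / f"{wrapper}.py").write_text(
        f"from {pkg}.base import Base\n\n\nclass Wrapper(Base):\n    pass\n"
    )
    return wrapper


def try_import(lazy: bool) -> str:
    """Import the wrapper first, then build a Pipeline; report what happened."""
    tag = "lazy" if lazy else "eager"
    with tempfile.TemporaryDirectory() as tmp:
        wrapper = build_demo(Path(tmp), tag, lazy)
        sys.path.insert(0, tmp)
        importlib.invalidate_caches()
        try:
            importlib.import_module(wrapper)
            pipeline_cls = importlib.import_module(f"demo_pkg_{tag}").Pipeline
            return type(pipeline_cls().vae).__name__
        except ImportError:
            return "ImportError"
        finally:
            sys.path.remove(tmp)
```

Calling `try_import(lazy=False)` hits the circular import, while `try_import(lazy=True)` constructs the pipeline normally. The trade-off of the lazy form is that an import error in the wrapper would surface at construction time rather than at module import.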

…an and deduplicate tile merge
- Add missing from_pretrained classmethod for consistency with other distributed autoencoders (KL, Wan, QwenImage)
- Delegate encode_tile_merge to tile_merge to eliminate byte-for-byte duplicate code
Signed-off-by: zzh <943967662@qq.com>

…AE tests Adjust grid_shape from (2,2) to (4,4) and tile count from 4 to 16. When tile_overlap_factor=0.25, overlap_size becomes 1 instead of 2, producing a denser 4x4 tile grid on the 4x4 input. Signed-off-by: zzh <943967662@qq.com>
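
The grid arithmetic in this commit message can be checked with a short sketch. It assumes the tiling stride follows the diffusers-style convention `overlap_size = int(tile_min_size * (1 - tile_overlap_factor))` and that a tile starts at every multiple of that stride; the function name `tile_grid` is hypothetical and the real wrapper may compute its grid differently.

```python
def tile_grid(height: int, width: int, tile_min_size: int, overlap_factor: float) -> tuple[int, int]:
    """Return (rows, cols) of the tile grid under the assumed stride formula."""
    # Stride between tile origins; int() truncates, so 2 * 0.75 = 1.5 becomes 1.
    overlap_size = int(tile_min_size * (1 - overlap_factor))
    rows = len(range(0, height, overlap_size))
    cols = len(range(0, width, overlap_size))
    return rows, cols


# 4x4 input, tile_min_size=2
no_overlap = tile_grid(4, 4, tile_min_size=2, overlap_factor=0.0)     # stride 2
with_overlap = tile_grid(4, 4, tile_min_size=2, overlap_factor=0.25)  # stride 1
```

Under these assumptions the stride drops from 2 to 1 when the overlap factor goes from 0.0 to 0.25, so the grid grows from 2x2 (4 tiles) to 4x4 (16 tiles), which matches the adjusted test expectations.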

…AE tests With tile_latent_min_size=2 and tile_overlap_factor=0.25, blend_extent truncates to int(0.5)=0, causing overlapping tiles with no blending and producing misaligned 7x7 output instead of the expected 4x4. Increasing min_size to 8 makes blend_extent=2 and keeps the tile pipeline's math self-consistent while preserving tile_overlap_factor=0.25. Signed-off-by: zzh <943967662@qq.com>
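
The 7x7 versus 4x4 mismatch can be reproduced along one axis with the sketch below. It assumes diffusers-style formulas (`overlap_size = int(min_size * (1 - factor))`, `blend_extent = int(min_size * factor)`, and each tile cropped to `min_size - blend_extent` before concatenation), equal sample and latent tile sizes, and a stub encoder whose output has the same spatial size as its input. The function name is hypothetical, and the 16-wide input in the second call is an illustrative choice rather than the size used in the actual test.

```python
def tiled_output_extent(extent: int, tile_min_size: int, overlap_factor: float) -> tuple[int, int]:
    """Return (blend_extent, merged output length) along one axis."""
    overlap_size = int(tile_min_size * (1 - overlap_factor))  # stride between tiles
    blend_extent = int(tile_min_size * overlap_factor)        # pixels blended per seam
    row_limit = tile_min_size - blend_extent                  # pixels kept per tile
    merged = 0
    for start in range(0, extent, overlap_size):
        tile_len = min(tile_min_size, extent - start)  # last tiles may be clipped
        merged += min(tile_len, row_limit)             # crop, then concatenate
    return blend_extent, merged


small = tiled_output_extent(4, tile_min_size=2, overlap_factor=0.25)
large = tiled_output_extent(16, tile_min_size=8, overlap_factor=0.25)
```

With `tile_min_size=2` the stride is 1 but each tile still contributes up to 2 pixels, so a 4-wide input merges to 7. With `tile_min_size=8` the stride (6) equals the kept width per tile (6), so the merged length equals the input length.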

Fishermanykx force-pushed the yukexiong/hunyuan_vae_opt branch 2 times, most recently from 421d557 to c69899e Compare April 24, 2026 03:36

Fishermanykx changed the title ~~[WIP][Feat.] Enable VAE parallel in HunyuanImage3~~ [Feat.] Enable VAE parallel in HunyuanImage3 Apr 24, 2026

Fishermanykx marked this pull request as ready for review April 24, 2026 03:36

Fishermanykx requested a review from hsliuustc0106 as a code owner April 24, 2026 03:36

Fishermanykx changed the title ~~[Feat.] Enable VAE parallel in HunyuanImage3~~ [Feat] Enable VAE parallel in HunyuanImage3 Apr 24, 2026

Fishermanykx force-pushed the yukexiong/hunyuan_vae_opt branch 2 times, most recently from ee9b0b3 to a4502c4 Compare April 24, 2026 07:23

Fishermanykx force-pushed the yukexiong/hunyuan_vae_opt branch 4 times, most recently from a4fc4ec to 378289a Compare April 30, 2026 02:50

wtomin mentioned this pull request May 7, 2026

[RFC]: Continuous Diffusion Model Acceleration Support #1217

Open

1 task

Bounty-hunter mentioned this pull request May 10, 2026

[RFC]: HunyuanImage Model deployment optimization #2015

Open

Bounty-hunter approved these changes May 10, 2026

Fishermanykx force-pushed the yukexiong/hunyuan_vae_opt branch from 378289a to 2eacaf2 Compare May 11, 2026 12:03

Fishermanykx requested review from Isotr0py, RuixiangMa, SamitHuang, ZJY0516, david6666666, princepride and wtomin as code owners May 11, 2026 12:03

Fishermanykx force-pushed the yukexiong/hunyuan_vae_opt branch 5 times, most recently from ae99ecc to c6f0e06 Compare May 14, 2026 02:42

Fishermanykx force-pushed the yukexiong/hunyuan_vae_opt branch from c6f0e06 to 8c4b866 Compare May 14, 2026 09:13

Gaohan123 added this to the v0.22.0 milestone May 14, 2026

Gaohan123 reviewed May 14, 2026

Fishermanykx requested a review from yenuo26 as a code owner May 15, 2026 07:02

Fishermanykx force-pushed the yukexiong/hunyuan_vae_opt branch from f338f4d to 014b54b Compare May 15, 2026 07:05

Fishermanykx added 2 commits May 15, 2026 16:46

[WIP][Feat.] Enable VAE parallel in HunyuanImage3

4e17c90

Signed-off-by: KexiongYu <yukexiong1@huawei.com>

[UT] Add Hunyuan distributed VAE tests

272bc98

Signed-off-by: KexiongYu <yukexiong1@huawei.com>

Fishermanykx force-pushed the yukexiong/hunyuan_vae_opt branch from 014b54b to 272bc98 Compare May 15, 2026 08:46

Gaohan123 added the ready label to trigger buildkite CI label May 18, 2026

Gaohan123 approved these changes May 18, 2026

Gaohan123 enabled auto-merge (squash) May 18, 2026 09:26

Gaohan123 disabled auto-merge May 18, 2026 09:49

Add concise per-test comments in tests/diffusion/distributed/test_aut…

61fc92a

…oencoder_kl_hunyuan.py to clarify what each case validates, without changing test behavior Signed-off-by: zzh <943967662@qq.com>

BLANKETusers force-pushed the yukexiong/hunyuan_vae_opt branch from 4f1aa1b to 61fc92a Compare May 19, 2026 03:54

RuixiangMa reviewed May 19, 2026

BLANKETusers added 5 commits May 19, 2026 14:22

Gaohan123 merged commit 2917959 into vllm-project:main May 20, 2026
7 of 9 checks passed
